
Semi Supervised Image Spam Hunter: A Regularized Discriminant EM Approach

Internal identifier: 000822 (Main/Exploration); previous: 000821; next: 000823

Authors: Yan Gao [United States]; Ming Yang [United States]; Alok Choudhary [United States]

Source :

Lecture Notes in Computer Science [ 0302-9743 ] ; 2009.

RBID : ISTEX:76B575CDBF69BDF5256683067175D726D5D4889A

Abstract

Abstract: Image spam is a new trend in email spam. These spam images use a variety of image processing techniques to introduce random noise. In this paper, we propose a semi-supervised approach, the regularized discriminant EM algorithm (RDEM), to detect image spam emails. It leverages a small amount of labeled data and a large amount of unlabeled data to identify spam and train a classification model simultaneously. Compared with fully supervised learning algorithms, semi-supervised learning is better suited to adversarial classification problems, because spammers actively protect their work by constantly making changes to circumvent spam detection. This makes it too costly for fully supervised learning to frequently collect sufficient labeled data for training. Experimental results demonstrate that our approach achieves a detection rate of 91.66% with a false positive rate below 2.96%, while significantly reducing the labeling cost.

Url:

https://api.istex.fr/document/76B575CDBF69BDF5256683067175D726D5D4889A/fulltext/pdf

DOI: 10.1007/978-3-642-03348-3_17

Affiliations:

Links to previous steps (curation, corpus...)

to stream Istex, to step Corpus: 001517
to stream Istex, to step Curation: 001430
to stream Istex, to step Checkpoint: 000344
to stream Main, to step Merge: 000830
to stream Main, to step Curation: 000822

The document in XML format

<record><TEI wicri:istexFullTextTei="biblStruct:series"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Semi Supervised Image Spam Hunter: A Regularized Discriminant EM Approach</title>
<author><name sortKey="Gao, Yan" sort="Gao, Yan" uniqKey="Gao Y" first="Yan" last="Gao">Yan Gao</name>
</author>
<author><name sortKey="Yang, Ming" sort="Yang, Ming" uniqKey="Yang M" first="Ming" last="Yang">Ming Yang</name>
</author>
<author><name sortKey="Choudhary, Alok" sort="Choudhary, Alok" uniqKey="Choudhary A" first="Alok" last="Choudhary">Alok Choudhary</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:76B575CDBF69BDF5256683067175D726D5D4889A</idno>
<date when="2009" year="2009">2009</date>
<idno type="doi">10.1007/978-3-642-03348-3_17</idno>
<idno type="url">https://api.istex.fr/document/76B575CDBF69BDF5256683067175D726D5D4889A/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">001517</idno>
<idno type="wicri:Area/Istex/Curation">001430</idno>
<idno type="wicri:Area/Istex/Checkpoint">000344</idno>
<idno type="wicri:doubleKey">0302-9743:2009:Gao Y:semi:supervised:image</idno>
<idno type="wicri:Area/Main/Merge">000830</idno>
<idno type="wicri:Area/Main/Curation">000822</idno>
<idno type="wicri:Area/Main/Exploration">000822</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Semi Supervised Image Spam Hunter: A Regularized Discriminant EM Approach</title>
<author><name sortKey="Gao, Yan" sort="Gao, Yan" uniqKey="Gao Y" first="Yan" last="Gao">Yan Gao</name>
<affiliation wicri:level="2"><country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Dept. of EECS, Northwestern University, Evanston, IL</wicri:regionArea>
<placeName><region type="state">Illinois</region>
</placeName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">États-Unis</country>
</affiliation>
</author>
<author><name sortKey="Yang, Ming" sort="Yang, Ming" uniqKey="Yang M" first="Ming" last="Yang">Ming Yang</name>
<affiliation wicri:level="2"><country xml:lang="fr">États-Unis</country>
<wicri:regionArea>NEC Laboratories America, Cupertino, CA</wicri:regionArea>
<placeName><region type="state">Californie</region>
</placeName>
</affiliation>
<affiliation><wicri:noCountry code="no comma">E-mail: myang@sv.nec-labs.com</wicri:noCountry>
</affiliation>
</author>
<author><name sortKey="Choudhary, Alok" sort="Choudhary, Alok" uniqKey="Choudhary A" first="Alok" last="Choudhary">Alok Choudhary</name>
<affiliation wicri:level="2"><country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Dept. of EECS, Northwestern University, Evanston, IL</wicri:regionArea>
<placeName><region type="state">Illinois</region>
</placeName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">États-Unis</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>2009</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">76B575CDBF69BDF5256683067175D726D5D4889A</idno>
<idno type="DOI">10.1007/978-3-642-03348-3_17</idno>
<idno type="ChapterID">17</idno>
<idno type="ChapterID">Chap17</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: Image spam is a new trend in email spam. These spam images use a variety of image processing techniques to introduce random noise. In this paper, we propose a semi-supervised approach, the regularized discriminant EM algorithm (RDEM), to detect image spam emails. It leverages a small amount of labeled data and a large amount of unlabeled data to identify spam and train a classification model simultaneously. Compared with fully supervised learning algorithms, semi-supervised learning is better suited to adversarial classification problems, because spammers actively protect their work by constantly making changes to circumvent spam detection. This makes it too costly for fully supervised learning to frequently collect sufficient labeled data for training. Experimental results demonstrate that our approach achieves a detection rate of 91.66% with a false positive rate below 2.96%, while significantly reducing the labeling cost.</div>
</front>
</TEI>
<affiliations><list><country><li>États-Unis</li>
</country>
<region><li>Californie</li>
<li>Illinois</li>
</region>
</list>
<tree><country name="États-Unis"><region name="Illinois"><name sortKey="Gao, Yan" sort="Gao, Yan" uniqKey="Gao Y" first="Yan" last="Gao">Yan Gao</name>
</region>
<name sortKey="Choudhary, Alok" sort="Choudhary, Alok" uniqKey="Choudhary A" first="Alok" last="Choudhary">Alok Choudhary</name>
<name sortKey="Choudhary, Alok" sort="Choudhary, Alok" uniqKey="Choudhary A" first="Alok" last="Choudhary">Alok Choudhary</name>
<name sortKey="Gao, Yan" sort="Gao, Yan" uniqKey="Gao Y" first="Yan" last="Gao">Yan Gao</name>
<name sortKey="Yang, Ming" sort="Yang, Ming" uniqKey="Yang M" first="Ming" last="Yang">Ming Yang</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000822 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000822 | SxmlIndent | more
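
The commands above print the record as indented XML. As a minimal sketch of one possible downstream step, the Python below collects the `<idno>` identifiers from such XML text. The helper `extract_idnos` and the `SAMPLE` string are hypothetical names introduced here for illustration and are not part of Dilib. A regular expression is used rather than an XML parser because the record, as shown on this page, uses the `wicri:` prefix without declaring it, which namespace-aware parsers may reject.

```python
import re

# Hypothetical helper (not part of Dilib): matches <idno type="...">value</idno>.
IDNO = re.compile(r'<idno type="([^"]+)">([^<]*)</idno>')

def extract_idnos(xml_text):
    """Return a dict mapping each idno type to the list of its values, in order."""
    found = {}
    for kind, value in IDNO.findall(xml_text):
        found.setdefault(kind, []).append(value)
    return found

# Sample lines copied from the record on this page (assumed input format).
SAMPLE = (
    '<idno type="RBID">ISTEX:76B575CDBF69BDF5256683067175D726D5D4889A</idno>\n'
    '<idno type="doi">10.1007/978-3-642-03348-3_17</idno>\n'
    '<idno type="ISSN">0302-9743</idno>\n'
    '<idno type="eISSN">1611-3349</idno>\n'
)

idnos = extract_idnos(SAMPLE)
for kind, values in idnos.items():
    print(kind, ", ".join(values))
```

To apply this to real output, one option would be to pipe the `HfdSelect` result into a script that passes `sys.stdin.read()` to `extract_idnos` instead of `SAMPLE`.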

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:76B575CDBF69BDF5256683067175D726D5D4889A
   |texte=   Semi Supervised Image Spam Hunter: A Regularized Discriminant EM Approach
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024

	Exploration server on OCR
	Warning: this site is under development! Warning: this site was generated automatically from raw corpora. The information has therefore not been validated.
